Deduplication and Group Detection using Links

Authors

  • Indrajit Bhattacharya
  • Lise Getoor
Abstract

[Abstract text is not recoverable from the source extraction; the only legible fragment is the section heading "Categories and Subject Descriptors".]

Similar resources

Cluster Based Duplicate Detection

We propose a clustering technique for entropy-based text dissimilarity calculation in a deduplication system. To improve the quality of grouping, we propose a Multi-Level Group Detection (MLGD) algorithm, which produces the most accurate group of the most closely related objects using the Alternative Decision Tree (ADT) technique. We propose two new algorithms; the first is Multi-Level Group...

Cryptographic Hashing Method using for Secure and Similarity Detection in Distributed Cloud Data

Received Jun 29, 2017; revised Nov 23, 2017; accepted Dec 17, 2017. The explosive increase of data brings new challenges to data storage and supervision in cloud settings. These data typically have to be processed in an appropriate fashion in the cloud. Thus, any increased latency may cause an immense loss to enterprises. Duplicate detection plays a major role in data management. Data...

Data Deduplication Report

The production of data is expanding at an astonishing pace. Data are exploding as companies and organizations collect and store increasing amounts of information. This huge amount of data requires more storage, processing power, and network bandwidth. To address this problem, data deduplication is being widely used. Hashing is widely used in data deduplication systems. Because hashing has many adv...

Low-Cost Data Deduplication for Virtual Machine Backup in Cloud Storage

In a virtualized cloud cluster, frequent snapshot backup of virtual disks improves hosting reliability; however, it takes significant memory resource to detect and remove duplicated content blocks among snapshots. This paper presents a low-cost deduplication solution scalable for a large number of virtual machines. The key idea is to separate duplicate detection from the actual storage backup i...

Ddup - towards a deduplication framework utilising apache spark

This paper is about a new framework called DeduPlication (DduP). DduP aims to solve large scale deduplication problems on arbitrary data tuples. DduP tries to bridge the gap between big data, high performance and duplicate detection. At the moment a first prototype exists but the overall project status is work in progress. DduP utilises the promising successor of Apache Hadoop MapReduce [Had14]...

Publication date: 2004